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We propose examples of a hybrid quantum-classical simulation where a classical computer assisted 
by a small quantum processor can efficiently simulate a larger quantum system. First we consider 
sparse quantum circuits such that each qubit participates in 0(1) two-qubit gates. It is shown that 
any sparse circuit on n k qubits can be simulated by sparse circuits on n qubits and a classical 
processing that takes time 2^^^^poly(n). Secondly, we study Pauli-based computation (PBC) where 
allowed operations are non-destructive eigenvalue measurements of n-qubit Pauli operators. The 
computation begins by initializing each qubit in the so-called magic state. This model is known 
to be equivalent to the universal quantum computer. We show that any PBC on n k qubits 
can be simulated by PBCs on n qubits and a classical processing that takes time 2^^^^poly(n). 
Finally, we propose a purely classical algorithm that can simulate a PBC on n qubits in a time 
2^^poly(n) where a 0.94. This improves upon the brute-force simulation method which takes 
time 2^poly(n). Our algorithm exploits the fact that n-fold tensor products of magic states admit 
a low-rank decomposition into n-qubit stabilizer states. 


I. INTRODUCTION 

Quantum computers promise a substantial speedup 
over classical ones for certain number-theoretic prob¬ 
lems and the simulation of quantum systems m- 
Experimental efforts to build a quantum computer 
remain in their infancy though, limited to proof-of- 
principle experiments on a handful of qubits. In con¬ 
trast, the design of classical computers is a mature 
field offering billions of operations per second in off- 
the-shelf machines and petafiops in leading supercom¬ 
puters. To prove their worth, quantum computers will 
have to offer computational solutions that rival the 
performance of classical supercomputers, a daunting 
task to be sure. 

Here we study hybrid quantum-classical computa¬ 
tion, wherein a small quantum processor is combined 
with a large-scale classical computer to jointly solve 
a computational task. To motivate this problem, 
imagine that a client can access a quantum computer 
with 100 qubits and essentially perfect quantum gates. 
Such a computer lies in the regime where it is likely 
to outperform any classical machine (since it would 
be nearly impossible to emulate classically). Imagine 
further that the client wants to implement a quantum 
algorithm on 101 qubits, but it is impossible to ex¬ 
pand the hardware to accommodate one extra qubit. 
Does the client have any advantage at all from the 
access to a quantum computer in this scenario? Can 
one divide a quantum algorithm into subroutines that 
require less qubits than the entire algorithm? Can 
one implement each subroutine separately and com¬ 
bine their outputs on a classical computer? These 
are the main questions addressed in the present pa¬ 
per. Put differently, we ask how to add one virtual 
qubit to an existing quantum machine at the cost of 
an increased classical and quantum running times, but 
without modifying the machine hardware. More gen¬ 
erally, one may ask what is the cost of adding k virtual 


qubits to an existing quantum computer of n qubits 
and how to characterize the tradeoff between quantum 
and classical resources in these settings. 

As one may expect, the cost of adding virtual qubits 
varies for different computational models. Although 
the circuit-based model of a quantum computer is 
the most natural and well-studied, several alternative 
models have been proposed, such as the measurement- 
based [4] and the adiabatic [5] quantum computing, 
as well as the model DQCl where most of the qubits 
are initialized in the maximally mixed state [6]. Our 
goal is to identify quantum computing models which 
enable efficient addition of virtual qubits. Below we 
describe two examples of such models. 

We begin with the model based on sparse quantum 
circuits. Recall that a quantum circuit on n qubits is 
a collection of gates, drawn from some fixed (usually 
universal) gate set, with n input qubits and n output 
qubits. Below we assume that the gate set includes 
only one-qubit and two-qubit gates. Let us say that a 
circuit is d-sparse if each qubit participates in at most 
d two-qubit gates. We shall be interested in the regime 
when d is a constant independent of n or when d grows 
very slowly, say d ^ log (n). This regime covers inter¬ 
esting quantum algorithms that can be described by 
low-depth circuits [7] since any depth-d quantum cir¬ 
cuit must be d-sparse (although the converse is gen¬ 
erally not true). It is believed that a constant-depth 
quantum computation cannot be efficiently simulated 
by classical means only BM- It is also likely that early 
applications of quantum computers will be based on 
relatively low-depth circuits because they impose less 
stringent requirements on the qubit coherence times. 

Define a d-sparse quantum computation, or d-SQC, 
as a sequence of the following steps: (i) initialization 
of n qubits in the |0) state, (ii) action of a d-sparse 
quantum circuit, (iii) measurement of each qubit in 
the 0,1 basis, and (iv) classical processing of the mea¬ 
surement outcomes that returns a single output bit 
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bout- We require that the final classical processing 
takes time at most poly{n). A classical or quantum 
algorithm is said to simulate a d-SQC if it computes 
probability of the output bout = 1 with a small addi¬ 
tive error. Our first result is the following theorem, 
which quantifies the cost of adding k virtual qubits to 
a d-SQC on n qubits. 

Theorem 1. Suppose n > kd-\-l. Then any d-sparse 
quantum eomputation onn^k qubits ean be simulated 
by a {d^ 3)-sparse quantum eomputation on n qubits 
repeated times and a elassieal proeessing whieh 

takes time poly(n). 

The above result is most useful when both k and d 
are small, for example, k = 0(1) and d = O(logn). 
In this case both quantum and classical running time 
of the simulation scale as poly{n). On the other hand, 
we expect that a direct simulation of a d-SQC on a 
classical computer takes a super-polynomial time (see 
the discussion above). Hence the theorem provides an 
example when a hybrid quantum-classical simulation 
is more efficient than a classical simulation alone. 

The proof of the theorem exploits the fact that any 
d-sparse quantum circuit U acting on a bipartite sys¬ 
tem AB with \A\ ^ k and \B\ ^ n can be decom¬ 
posed into a linear combination of tensor prod¬ 

uct terms Va 0 Wq,, where Va and Wa are d-sparse 
circuits acting on A and B respectively. We show 
that the task of simulating U can be reduced to sim¬ 
ulating the smaller circuits Wa, as well as computing 
certain interference terms that involve pairs of circuits 
Wa, Wg. We show that the interference terms can be 
estimated by a simple SWAP test which can be real¬ 
ized by a (d + 3)-sparse computation on n qubits. 

Our second model is called Pauli-based computa¬ 
tion (PBC). We begin with a formal definition of the 
model. Let be the set of all hermitian Pauli op¬ 
erators on n qubits, that is, n-fold tensor products of 
single-qubit Pauli operators /, X, T, Z with the over¬ 
all phase factor ±1. A PBC on n qubits is defined 
as a sequence of elementary steps labeled by integers 
t = 1,..., n where at each step t one performs a non¬ 
destructive eigenvalue measurement of some Pauli op¬ 
erator Pf G . Let (Tt be the measured eigenvalue of 
Pt. Note that at = ±1 since any element of squares 
to one. We allow the choice of Pt to be adaptive, that 
is, Pt may depend on all previously measured eigen¬ 
values (Ji,..., at-i. The latter have to be stored in a 
classical memory. The computation begins by initial¬ 
izing each qubit in the so-called magic state 

\H) = cos (7r/8)|0) + sin (7r/8)|l). 

Once all Pauli operators Pi,..., have been mea¬ 
sured, the final quantum state is discarded and one 
is left with a list of measured eigenvalues ai,..., cr^. 
The outcome of a PBC is a single classical bit bout ob¬ 
tained by performing a classical processing of the mea¬ 
sured eigenvalues. All classical processing must take 


time at most poly{n). We shall prove that the com¬ 
putational power of a PBC does not change if one ad¬ 
ditionally requires that all Pauli operators Pi,..., P^ 
pairwise commute (for all measurement outcomes). A 
classical or quantum algorithm is said to simulate a 
PBC if it computes probability of the output bout = 1 
with a small additive error. An example of a PBC is 
shown at Fig. 

The PBC model naturally appears in fault-tolerant 
quantum computing schemes based on error correcting 
codes of stabilizer type m- Such codes enable a sim¬ 
ple fault-tolerant implementation of non-destructive 
Pauli measurements on encoded qubits, for example 
using the Steane method m- Furthermore, topo¬ 
logical quantum codes such as the surface code en¬ 
able a direct measurement of certain logical Pauli 
operators by measuring a properly chosen subset of 
physical qubits m- Several fault-tolerant protocols 
for preparing encoded magic states such as \H) have 
been developed UMI]. PBCs implicitly appeared in 
the previous work on quantum fault-tolerance. Our 
analysis closely follows the work by Campbell and 
Brown [18] who showed that a certain class of magic 
state distillation protocols can be implemented by 
PBCs. 

Let us now state our results. First, we claim that a 
PBC has the same computational power as the stan¬ 
dard circuit-based quantum computing model. 

Theorem 2. Any quantum eomputation in the 
eireuit-based model with n qubits and poly{n) gates 
drawn from the Clifford+T set ean be simulated by a 
PBC on m qubits, where m is the number ofT gates, 
and poly{n) elassieal proeessing. 

Recall that the Clifford+T gate set consists of 
single-qubit gates 
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and the two-qubit CNOT gate. This gate set is known 
to be universal for quantum computing. Secondly, we 
show that PBCs enable efficient addition of virtual 
qubits. 

Theorem 3. A PBC on n P k qubits ean be simu¬ 
lated by a PBC on n qubits repeated 2^^^^ times and 
a elassieal proeessing whieh takes time 2^^^^poly{n). 

Both theorems follow from the fact that a gener¬ 
alized PBC that incorporates unitary Clifford gates, 
ancillary stabilizer states (such as |0) or |+)), and has 
poly{n) measurements can be efficiently simulated by 
the standard PBC defined above. To prove Theorem 
we convert a given quantum circuit on n qubits with 
m T-gates into a generalized PBC on n + m qubits 
initialized in the state. Each T-gate 

of the circuit is converted into a simple gadget that 
includes adaptive Pauli measurements and consumes 
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one copy of the \H) state. Simulating such general¬ 
ized PBC by the standard PBC on m qubits proves 
Theorem |2l 

To prove Theorem we represent k copies of the 
magic state \H) as a linear combination of /c-qubit sta¬ 
bilizer states (pa such that 

for some real coefficients Ca- The number of terms in 
this sum is We carry out the simulation inde¬ 

pendently for each a using a generalized PBC on k-\-n 
qubits initialized in the state Ipa) ^ \H^^) and com¬ 
bine the outcomes on a classical computer. Finally, we 
simulate the generalized PBCs by the standard PBCs 
on n qubits. 

Perhaps more surprisingly, we prove that PBCs can 
be simulated on a classical computer alone more ef¬ 
ficiently than one could expect naively. Let us first 
describe a brute-force simulation method based on 
the matrix-vector multiplication. Let (pt be the n- 
qubit state obtained after measuring the Pauli op¬ 
erators Pi,... ^Pf. One can store pt in a classical 
memory as a complex vector of size 2^. Each step 
of a PBC involves a transformation pt Pt-\-i where 
pt-\-i = -\-atPt)pt- Since Pt is a Pauli operator, 

the matrix of Pt in the standard basis is a permuta¬ 
tion matrix modulo phase factors. Thus, for a fixed 
vector pt, one can compute pt-\-i for both choices of 
at in time 0(2’^). Furthermore, one can compute the 
norm of pt-\-i in time 0(2'^) and thus determine the 
probability of each measurement outcome at. By flip¬ 
ping a classical coin one can generate a random vari¬ 
able at = PI with the desired probability distribution. 
Since any PBC has at most n steps, the overall cost 
of the classical simulation is 0{n2'^). Below we show 
that this brute force simulation method is not optimal. 

Theorem 4. Any PBC on n qubits can be simulated 
classically in time 2^^poly(n), where a « 0.94. 

Our simulation algorithm exploits the fact that ten¬ 
sor products of magic states admit a low-rank decom¬ 
position into stabilizer states. Recall that an n-qubit 
state p is called a stabilizer state if \p) = U\0^^) for 
some n-qubit Clifford operator U — a product of the 
elementary gates H, S, and the CNOT. 

Suppose p is an arbitrary n-qubit state. Define a 
stabilizer rank of p as the smallest integer y such that 
p can be written as \p) = where c^ 

are complex coefficients and pa are n-qubit stabilizer 
states. The stabilizer rank of p will be denoted xW- 
By definition, 1 < xW P ^-qubit state p 

and x('0) = 1 iff is a stabilizer state. For example, 
the magic state \H) has stabilizer rank x{H) = 2, 
since \H) is not a stabilizer state itself, but it can be 
written as a linear combination of two stabilizer states 
|0) and |1). Furthermore, using the identity 

^ 1 (| 00 ) + | 11 )) + ^(| 00 ) + | 01 ) + | 10 )-| 11 )) 


one can easily check that = 2. More gen¬ 
erally, let Xn be the stabilizer rank of Note 

that Xn+m < XnXm siucc a tensor product of two 
stabilizer states is a stabilizer state. In particular, 
Xn < = 2 ”/ 2 . 

The probability to observe measurement outcomes 
ai,... ,at in a PBC implemented up to a step t can 
be written as 

a,/3=l 

where pa are n-qubit stabilizer states, Ca are complex 
coefficients, and 11 = ^aPa)l‘^ is the pro¬ 

jector describing the partially implemented PBC. We 
will use a version of the Gottesman-Knill theorem m 
to show that each term {pa\Il\py) can be computed 
on a classical computer in time n^. Since the number 
of terms is Xn number of steps is at most n, 

we would be able to simulate a PBC on n qubits clas¬ 
sically in time (Xn)^^^- Improving upon the brute- 
force simulation method thus requires an upper bound 
Xn P for some [3 < 1/2. We establish such an up¬ 
per bound with p = log2(7)/6 « 0.468 by showing 
that X6 < 7 which implies Xn < 
expect that the scaling in Theorem]^ can be improved 
by computing Xn for larger values of n. In Appendix B 
we describe a heuristic algorithm for computing low- 
rank decompositions of into stabilizer states 

which yields the following upper bounds: 
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We believe that these upper bounds are tight. A lower 
bound Xn P is proved in Appendix C. 

II. DISCUSSION AND PREVIOUS WORK 

Classical algorithms for simulation of quantum cir¬ 
cuits based on the stabilizer formalism have a long 
history. Notably, Aaronson and Gottesman m stud¬ 
ied adaptive quantum circuits that contain only a few 
non-Glifford gates. Assuming that a circuit contains 
at most m non-Clifford gates and that all n qubits are 
initially prepared in some stabilizer state. Ref. m 
showed how to simulate such a circuit classically in 
time 2^^poly{n). To enable a comparison with our 
results, assume that all unitary gates belong to the 
Clifford+T set. By Theorem a quantum circuit 
as above can be transformed into a PBC on m qubits, 
where m is the number of T-gates. Thus Theorems |2|4| 
provide a classical simulation algorithm with a run¬ 
ning time 2^-^^^poly{n) which improves upon [19]. 
In addition. Ref. m studied adaptive quantum cir¬ 
cuits composed only of Clifford gates and Pauli mea¬ 
surements with more general initial states. Assum¬ 
ing that the initial n-qubit state can be written as 
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FIG. 1. Example of a PBC on n = 3 qubits. Each step t 
involves an eigenvalue measurement of a Pauli operator Pt 
on n qubits with an outcome at = zb 1. A choice of Pt may 
depend on the outcomes of all previous measurements. A 
PBC on n qubits can be described by a binary tree T 
of height n such that internal nodes of T are labeled by 
n-qubit Pauli operators and leaves of T are labeled by 
0 and 1. The latter represent the final output bit bout- 
We require that label of any node of T can be computed 
classically in time poly{n). 


a tensor product of some 6 -qubit states, a quantum 
circuit as above can be simulated classically in time 
where d is the total number of mea¬ 
surements m- 

Methods for decomposing arbitrary states into a lin¬ 
ear combination of stabilizer states aimed at simula¬ 
tion of quantum circuits were pioneered by Garcia, 
Markov, and Cross pniiii] who studied decomposi¬ 
tions into pairwise orthogonal stabilizer states (named 
stabilizer frames). The latter are more restrictive than 
the general decompositions analyzed in the present 
paper. Furthermore, Refs. EOIIII] have not studied 
stabilizer decompositions of magic states. 

The simulation algorithm of Theorem is concep¬ 
tually close to the matrix multiplication algorithms 
based on tensor decompositions [22l [23] . In this case 
the analogue of a stabilizer state is a product state and 
the analogue of a magic state is a tripartite entangled 
state that contains EPR-type states shared between 
each pair of parties, see [24] for details. 

Efficient classical algorithms for simulation of quan¬ 
tum circuits in which the initial state can be described 
by a discrete Wigner function taking non-negative 
values were investigated by Veitch et al [25] and by 
Howard [26] et al. As was pointed out by Pashayan, 
Wallman, and Bartlett EH, such methods can be com¬ 
bined with Monte Carlo sampling techniques to enable 
classical simulation of general quantum circuits with 
the running time scaling exponentially with the quan¬ 
tity related to the negativity of the Wigner function. 
To enable a comparison between Theorem and the 


results of [27] one can employ a discrete Wigner func¬ 
tion representation of stabilizer states and Clifford op¬ 
erations on qubits developed by Delfosse et al [28] , 
The latter is applicable only to states with real am¬ 
plitudes and to Clifford operations that do not mix 
X-type and Z-type Pauli operators (CSS-preserving 
operations). A preliminary analysis shows that com¬ 
bining the results of Refs. EH EH] yields a classical 
algorithm for simulating a restricted class of PBC on 
n qubits in time M‘^^poly{n) ~ 2^'^^^^poly{n)^ where 
M = 2~^ + 2“^/^ ^ 1.207 is the so-called mana of the 
magic state \H){H\, see [27l [28] for details. The re¬ 
striction is that all Pauli operators to be measured are 
either X-type or Z-type, and the measurements can¬ 
not be adaptive. Such restricted PBCs are not known 
to be universal for quantum computation. 

Our method of simulating sparse quantum circuits 
has connections to ideas of tensor network represen¬ 
tations of quantum circuits developed by Markov and 
Shi [29]. Indeed, our proof of Theorem can be in¬ 
terpreted as a particular method of expressing the 
acceptance probability of a quantum computation in 
terms of a contraction of tensors associated with the 
quantum circuit. The individual entries of the tensors 
are then estimated separately with a smaller quantum 
computer and then added together. 

Let us now discuss some open problems and possi¬ 
ble generalizations of our work. A natural question is 
whether the scaling in Theorem [^ can be improved if 
\H) is replaced by some other magic state. By defini¬ 
tion, any magic state is Clifford-equivalent to one of 
the states \H) and |R), where |R) is the +1 eigenvec¬ 
tor of an operator (X + F + see Ref. [13] for 

details. The numerics suggests that and 

have the same stabilizer rank for n < 6 . We con¬ 
jecture that this remains true for all n. Moreover, we 
pose the following conjecture which, if true, highlights 
a new optimality property of magic states in terms of 
their stabilizer rank. 

Conjecture 1. Let Xn be the stabilizer rank of 
and (j) be an arbitrary single-qubit state. Then 

^( 00 n) _ 2 if (j) is a stabilizer state^ 
^( 00 n) = if (j) is a magie state^ 

^( 00 n) ^ otherwise. 

Less formally, the conjecture says that magic states 
have the smallest possible stabilizer rank among all 
non-stabilizer single-qubit states. 

It is also of great interest to understand the asymp¬ 
totic scaling of the stabilizer rank Xn- Assuming that 
a universal quantum computation cannot be simu¬ 
lated classically in polynomial time, one infers that 
Xn must grow super-polynomially in the limit n ^ oo. 
However, we were unable to derive such a lower bound 
directly without using any assumptions. The fact that 
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amplitudes of any stabilizer state in the standard ba¬ 
sis take only 0 ( 1 ) different values implies a weaker 
lower bound Xn ^ see Appendix C. We con¬ 
jecture that in fact Xn > Note that if this 

conjecture is false, that is, Xn ^ then constant- 

depth circuits in the Clifford+T basis can be simu¬ 
lated classically in a sub-exponential time, which ap¬ 
pears unlikely. Indeed, since such a circuit contains 
at most m = 0{n) T-gates, where n is the number of 
qubits. Theorems |2|4| would provide a simulation algo¬ 
rithm with a running time X^'Poly(n) = 2^^^^poly{n). 
(Here we ignore the complexity of finding the optimal 
stabilizer decomposition since it has to be done only 
once for each n.) 

Finally, one may explore generalizations of the sta¬ 
bilizer rank to approximate decompositions into sta¬ 
bilizer states. It should be pointed out that the sim¬ 
ulation algorithm of Theorem would require ap¬ 
proximate stabilizer decompositions with a precision 
at least since the probability of a particular 

measurement outcome ai,..., at can be exponentially 
small in n. It is not clear whether such approxi¬ 
mate decompositions would have a rank substantially 
smaller than the exact ones. 


In the rest of the paper we prove the theorems 
stated in the introduction. From the technical per¬ 
spective, Theorems 1 1 12|3| follow easily from the defini¬ 
tions and from the previously known results. On the 
other hand. Theorem and the notion of a stabilizer 
rank appear to be new. We analyze sparse quantum 
circuits in Section |ml A classical algorithm for simu¬ 
lation of PBCs and the stabilizer rank of magic states 
are discussed in Section [IV| Theorems 2|3 are proved 
in Section [V| Appendix A proves a technical lemma 
needed to compute inner products between stabilizer 
states. Appendix B describes a numerical method of 
computing low-rank stabilizer decompositions. Ap¬ 
pendix C proves a lower bound on the stabilizer rank 
of magic states. 


III. SPARSE QUANTUM CIRCUITS 

In this section we prove Theorem All quantum 
circuits considered below are defined with respect to 
some fixed basis of gates Q. We assume that any gate 
in Q acts on at most two qubits. Furthermore, we 
assume that Q contains all single-qubit Pauli gates 
X, T, Z, their controlled versions, the Hadamard gate, 
and the 7 r /2 phase shift S' = |0)(0|+i|I)(I|. For exam¬ 
ple, Q could be the Clifford+T basis. Let = {0,1}’^ 
be the set of n-bit binary strings. 


Lemma 1. Let U he a d-sparse quantum eireuit on 
k^n qubits. Partition the set of qubits as AB, where 


|A| = k and \B\ = n. Then 

X 

u = x = ( 1 ) 

cx=l 

where Va and Wa are d-sparse quantum eireuit aeting 
on A and B respeetively, and are some eomplex 
eoeffieients sueh that X]a=i ~ 

Proof. Since U is a d-sparse circuit, it contains at most 
kd two-qubit gates that couple some qubit of A and 
some qubit of B. Let Gi,..., Gm be the list of all such 
gates, where m < kd. Any two-qubit gate G[i^j] act¬ 
ing on qubits i G A and j ^ B can be expanded in the 
Pauli basis as G[i, j] = ^ ^a[j]^ where 

Pa G {/, X, T, Z} are Pauli operators and are some 
complex coefficients such that = I. Apply¬ 

ing the above decomposition to each gate Gi,..., Gm 
and, if necessary, appending dummy identity gates to 
make m = kd, one arrives at Eq. 0 . Note that re¬ 
placing a two-qubit gate in U by a tensor product of 
two single-qubit Pauli gates cannot increase the spar¬ 
sity of the circuit. Thus each term Va^Wa is a tensor 
product of two d-sparse circuits. □ 

The classical post-processing step can be described 
by di poly{n) classical circuit / : ^ { 0 ? !}• By 

definition of the SQC model, the final output of a 
computation is a single random bit bout = f{x)^ where 
X G is the bit string obtained by measuring each 

qubit of a state in the 0,1 basis. Let 7r{U) 

be the probability of the output bout = I 5 fbat is, 

7r{U) = (0^+^|U’^nU|0^+^), 


n= E \r){x\. (2) 

a: : f{x) = l 

Let us first show how to estimate the quantity 7 r(U) 
with a small additive error using d/c-sparse circuits on 
n + I qubits. Substituting Eq. 0 into the definition 
of 7r(I/) one gets 


a,/3=l 


where 


Ca(2/) = C„(2/|y„|0''), !</.«) = W^alO"), 

and 


n(2/) = E 

fiyz)=i 


We claim that each coefficient (y) can be computed 
exactly in time 0(/cd • 2 ^). Indeed, we can merge con¬ 
secutive single-qubit gates of W such that each qubit 






6 


is acted upon by at most d two-qubit gates and at 
most d 1 single-qubit gates. Thus we can assume 
that the total number of gates in Va is 0{kd). One 
can compute the quantity (^|Va|0^) classically in time 
0{kd’2^) by performing matrix-vector multiplication 
for each gate of Va- Furthermore, it is clear from the 
proof of Lemma that each coefficient Ca can be com¬ 
puted in time 0{kd). 

Consider some fixed triple {y^a^ P) that appears in 
the sum Eq. (©■ Define a controlled-operator 

A{W) = |0)(0| C) + |1)(1| O W^. 

Define a quantum circuit R acting on n + 1 qubits that 
consists of the following steps: (i) initialize n+1 qubits 
in the |0) state, (ii) apply H gate to the first qubit, (hi) 
apply A{W) with the first qubit acting as the control 
one, (iv) apply H gate to the first qubit, (v) measure 
each qubit in the 0,1-basis. The construction of i?, 
illustrated at Fig. is very similar to the standard 
SWAP test, except that we finally measure each qubit. 
Let 6, ^ be the measurement outcomes, where 6 = 0,1 
and z G see Fig. Define a random variable 
^y,a,y faking values ±1 such that cr'y^a^y = 1 iff 6 = 0 
and f{yz) = 1. Otherwise cr'y^a^y ~ ^ simple 

algebra shows that 

Re(((/.„|n(y)|(/.^))=E(a;,„,^), (4) 

that is cr'y^a^y unbiased estimator of the real part 
of {(j)a\^{y)\(t)f 3 )- We claim that one can get a sample 
of ^ ^ by executing a single instance of a d/c-sparse 
quantum computation on n + 1 qubits (with certain 
special properties). Indeed, by construction, the cir¬ 
cuits Wa and Wy can be obtained from each other by 
changing some subset of at most kd single-qubit Pauli 
gates. Thus the controlled circuit A(1F) only needs 
control for at most kd single-qubit Pauli gates. This 
shows that the control qubit participates in at most 
kd two-qubit gates. Furthermore, since all locations 
where Wa and differ from each other originate 
from two-qubit gates in the initial d-sparse circuit 1/, 
we conclude that the circuit R has a special property 
that all qubits except for the control one participate 
in at most d two-qubit gates. One can similarly define 
a random variable cFy^a.y that 

The only difference is that the H gate in the circuit 
R must be replaced by ids' gate. We conclude that 

Thus the quantity 'k(U) has an unbiased estimator 

X _ 

^ = E E K,a,/3 +KU./ 3 ) ’ 

a,/3=l 


that is, 7r(t/) = E(^). Using the bounds \ca{y)\ < \ca\ 
and |caP = 1 one gets 

X 

lei < 2 F E |c«(2/)GM| < 2^=+^ 

a,/3=l 

with probability one. By Hoeffding’s inequality, one 
can estimate E(^) with a small additive error by 
generating samples of ^ for some constant 

c = 0(1). Generating each sample of ^ requires 2^x^ 
samples of the cr-variables. Thus one can estimate 
7r(i/) by repeated applications of dd-sparse circuits on 
n + 1 qubits with the number of repetitions scaling as 
_ 2^0(kd) 

Recall that the dd-sparse circuits R constructed 
above have a very special pattern of sparsity. Namely, 
all qubits except for one participate in at most d two- 
qubit gates, whereas one remaining qubit participates 
in at most kd two-qubit gates. We can distribute the 
sparsity more evenly among all n + 1 qubits by per¬ 
forming a swap gate that changes position of the con¬ 
trol qubit after each application of a control gate (this 
is possible only if n is sufficiently large, specifically, if 
n > dd + 1). After this modification one obtains an 
equivalent circuit which is (d + 3)-sparse. 

Finally, we can apply exactly the same arguments 
as above if the subsets A and B in Lemma [T] have 
size |A| = k 1 and \B\ = n — 1. This frees up 
one extra qubit that can play the role of the control 
one in the above construction. Now we can estimate 
7r{U) by repeated applications of (d+3)-sparse circuits 
on n qubits with the number of repetitions scaling as 
^2i6(/c+i)d+3(/c+i) _ 20 {kd) ^ This completes the proof 
of Theorem [H 



FIG. 2. Quantum circuit R used to estimate the real part 
of {(j)a\B{y)\(j)p) in Eq. pj. The final output of the circuit 
is a random variable ^ = ±1 such that cry^a,p — f 
iff 5 = 0 and f{yz) = 1, where / : {0,^ {0,1} 
is the Boolean function describing post-processing step in 
the original circuit U on n k qubits. We construct a 
circuit R as above for each triple (y, a, /3) with y G {0, 1}^ 
and a, /3 = 1,..., y. A simple algebra shows that is 

an unbiased estimator of Iie{{(t)a\B(y)\(t)p)). 
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IV. STABILIZER RANK AND CLASSICAL 
SIMULATION OF PBC 


Lemma 3. The action ofU in the computational ba¬ 
sis can he represented as 


In this section we prove Theorem We begin with 
an algorithm for computing a quantity ('0|n|0), where 
'0,0 are n-qubit stabilizer states and 11 is a projec¬ 
tor onto the codespace of some stabilizer code. We 
note that several previous works addressed the prob¬ 
lem of computing the inner product ('0|0) between 
stabilizer states '0,0. In particular, Aaronson and 
Gottesman m showed that the magnitude |('0|0)|can 
be computed in time 0{n^). Furthermore, Garcia, 
Markov, and Cross m used canonical form of Clif¬ 
ford circuits to compute both the magnitude and the 
phase of (0|0) in time O(n^). Below we describe 
a technically different (and somewhat simpler) algo¬ 
rithm which is more suited for computing the quantity 

(0|n|0) as above. 

Let Zm = {0,1,..., m — 1} be the cyclic group of 
order m. A function f : ¥2 ^ Zg is called a degree- 
two polynomial if 

n 

f{xi,...,Xn)=h + 2'^faXa + ^ fafiXa^b 

a=l l<a<h<n 

where G Zg, fa ^ Z 4 , and fa^b ^ ^2 are some 
constant coefficients. Define 

xe¥'^ 


U\x) = 2-* Yi \x + yA) 

for some degree-two polynomial g : ¥2 ^ Zg and 
some binary matrices A, B of size t x n. 

Proof Given a binary vector / G F 2 , let X{f) G 
be the Pauli operator that applies X to each qubit in 
the support of /. Define Z{f) in a similar fashion. 
Let e^ G be the basis vector which has a single 
T’ at the position k. The k-th generator of Q can be 
written as Pk = i^^ X {e^ A)Z {e^ B) for some G Z4 
and some binary matrices A, B of size t x n. In other 
words, the k-th row of A (of B) specifies the X-part 
(the Z-part) of Pk. Choose any vector ^ G F^. Then 

Piy)= JJ Pk=u<^^y^XiyA)Z{yB), 
k : = l 

where 

^9iy) = j^J2l=iCkyk . (^_i^J2i<k<i<A^^'^)k,iykyi ^ 

Clearly, g : F^ ^ Zg is a degree-two polynomial. 
Thus 

2*''n\x) = Y P{y)\x) = Y \x + yA). 

S/eF* y6F* 


Lemma 2. Let f : ¥2 ^ Zg be a degree-two polyno¬ 
mial. Then either (/) = 0 or (/) = for some 

integer n < p <2n and some m G Zg. Furthermore, 
one can compute (/) in time 0{n^). 

Since the proof is rather straightforward, we post¬ 
pone it until Appendix A. It was shown by Dehaene 
and De Moor [30] and by Van den Nest m that any 
stabilizer state 0 of n-qubit can be written (up to a 
global phase and a normalization) as 

10) = -\- (5) 


for some degree-two polynomial / : F 2 ^ Zg, some 
k X n binary matrix and some vector z G F 2 . Here 
we treat u and z as row vectors. In the rest of this 
section we take Eq. (§ as our definition of a stabilizer 
state. 

Let G C be an abelian group with t independent 
generators Pi, P 2 ,..., Pt G Q. Define a projector H 
onto the ^-invariant subspace. 


□ 


Consider now a pair of n-qubit stabilizer states 0, 0, 
where 0 is defined in Eq. (§ and 




(6) 




Here h : F^ ^ Zg is a degree-two polynomial, ^ is 
a binary matrix of size m x n, a nd z ' G F2 is some 
vector. Using Lemmaand Eqs. (5^) one gets 


in|(/.) = 2-* ^ 


UJ‘ 


h{v)-f{u)-\-g{y) 


u,v,y 




Clearly the non-zero terms are those with 2 ; + = 

z' + -\- yA. We can enforce this equality by intro¬ 

ducing an extra variable x G F 2 such that 

{z+u^\P+v^+yA) = 2-" Y 


U = 2 -^Y P- 

Peg 


Then 

{ip\Il\(j)) = 2 -^+* Y = 2-"+*(F) (7) 

u,v,x,y 



with 


F{u, V, X, y) = h{v) - f{u) + g{y) + AyB{z' + 
+ 4:x{z + + 2 ;' + + yA). 


where a = (o^i,..., am), and 0 a = 

0 ai ( 8 ) • • • ( 8 ) <pam • Note that 0 a are stabilizer states and 
for a given index a one can compute the standard form 
of 0a as defined in Eq. (§ in time 0{n). Denoting 


Note that F{u^ v, x, y) is a degree-two polynomial in 
/c + m + n + t variables. By Lemmaone can compute 
the sum (F) in time 0(/c + m + t + n)^ = 0{n^). Also, 
Lemmaj^and Eq. Q implies that ('0|n|0) takes values 
for some integer q and j G Zg. 

Consider now a PBC on n qubits as defined in Sec¬ 
tion |V| Let t be some fixed time step. Recall that a 
sequence of measurement outcomes ai,..., at is ob¬ 
served with the probability 


n = 2-‘n(/ + afePfe) 

k=l 

we get 

Pr(( 7 i, ... ,crt) = ^Cb{(t)a\Ti\(pb)- (10) 

a,b=l 


Pr(ai, ...,at) = (P®"| 1[{1/2){I + cTfePfe)|P®"). 

k=l 


Below we shall construct an algorithm that takes as 
input a step t, a sequence of outcomes ai,..., at and 
returns Pr(ai,..., at). It allows us to compute Pr(ai) 
and get a sample of ai by flipping a coin with a prop¬ 
erly chosen bias. By calling the algorithm twice one 
can also compute conditional probabilities 


Pr(crt|cri, . . . ,CTt_i) 


Pr(cri,...,o-t) 

Pr(cri,...,(Tt_i)‘ 


The discussion above implies that each term 
( 0 a|n| 0 b) can be computed exactly in time O(n^). 
Assuming that arithmetic operations with complex 
numbers have a unit cost (see Remark 1 below), the 
probability Pr(ai,..., a^) can be computed in time 

Let us now show an explicit decomposition Eq. 
with k = 6 and X = 7. This gives an algorithm 
for computing Pr(ai,..., a^) with a running time 
Q ( 772 / 3 ^ 3 ) ig enough to prove Theorem It 

will be more convenient to normalize the magic state 
such that 


Thus, for fixed variables ai,...,at-i one can get a 
sample of at by computing the conditional proba¬ 
bility Pr(at|ai,..., at-i) and flipping a coin with a 
properly chosen bias. The ability to sample the out¬ 
comes ai,..., a^ from the distribution Pr(ai,..., a^) 
is equivalent to simulating the PBC classically. Hence 
it suffices to construct an algorithm that computes 
Pr((Ti,...,(T(). 

Suppose we are given some integers k,x = 0(1) and 
a decomposition 

= ( 8 ) 

a=l 

where 0 a are /c-qubit stabilizer states and Ca are com¬ 
plex coefficients. Suppose also that n = mk for some 
integer m. Taking the m-fold tensor power of Eq. 
one gets 

x^ 

= (9) 

a=l I 


\H) = |0) + t|l), t = tan (tt/S) = V2 — 1. 

Let Bn = F 2 be the set of all n-bit strings and Bn^k Z 
Bn be the subset of strings with the Hamming weight 
exactly k. Let Bn = U On, where En and On 
are the subsets of even-weight and odd-weight strings 
respectively. Given a set of bit strings F, we shall 
write \S) = Y.:ces \x) for the uniform superposition of 
all strings in F. Eor example, |F^,o) = |F^,n) = 

and = ^]^=Qt^\Bn,k)- Define also a 

state 

\Kn) = E \x) = n 

x£Bn i<j 

Note that \Bn^o'), iBn^n}, lEn}, I On) ; and nre 

stabilizer states as defined by Eq. Define also a 
pair of graphs G' = (E',F') and 0'^ = {V",E") with 
six vertices shown on Eig. The desired stabilizer 
decomposition of is 


\H^^) = (-16 + 12\/2)|B6.o) + (96 - 68V2)\Be^e) + (10 - 7V2)\Ee) + (-14 + 10\/2)|O6) 

+(7 - 5V2)Z^^\Ke) + (10 - 7V2)|((.') + (10 - (11) 


10 ')= n HZ)i,j\Oe) and 10 ") = n HZkjlOe). 


where 
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This completes the proof of Theorem The numer¬ 
ical method used to find the above decomposition is 
discussed in Appendix B. We conjecture that k = 6 is 
the smallest integer such that < 2^, see Section]^ 
Accordingly, k = 6 is likely to be the smallest integer 
for which the the above simulation strategy outper¬ 
forms the brute-force simulation algorithm. 

Remark 1: Let us point out that all coefficients in 
Eq. (11) belong to the ring Z[\/2] = {p + V2q p^q ^ 
Z} known the ring of quadratic integers with a base 
two. Hence the coefficients in E also belong to 
Z[\/2]. Using Eq. a and Lemma 1^ we conclude that 
each term in Eq. (10) has a form 2~^rjuj^ for some 
integer 0 < g < n, some r] G Z[\/2], and some j G Zg. 
Multiplying Eq. ( [1Q| ) by a suitable power of two we 
can assume that each term in Eq. (10) has a form 
Of + iyd where a,/3 e Z[v^] (of course we can ignore 
the imaginary part i/3 since Pr(cri,..., at) is a real 
number). Thus computing the sum in Eq. ( [1Q| ) only 
requires arithmetic operations in the ring Z[\72]. 

Remark 2: One can notice that the first five terms 
in Eq. (11) are stabilizer states symmetric under all 
permutations of qubits. On the other hand, the states 
(j)' and (/)" break the permutation symmetry. Inter¬ 
estingly, we found that the state \H®^) does not be¬ 
long to the subspace spanned by symmetric stabilizer 
states of six qubits. Thus any stabilizer decomposi¬ 
tion of \H^^) must use at least two non-symmetric 
states. On the other hand, one can check that \H®^) 
belongs to the subspace spanned by symmetric stabi¬ 
lizer states for n < 5. The best decompositions that 
we were able to find for n < 5 are formed by symmet¬ 
ric stabilizer states, see Appendix B. 


V. ADDING VIRTUAL QUBITS TO A PBC 


In this section we prove Theorems HH We begin 
with Theorem Recall that we consider a quan¬ 
tum circuit U on n qubits in the CliffordH-T ba¬ 
sis which contains m T-gates. We assume that all 
qubits are initialized in the |0) state. Each qubit is 
finally measured in the 0,1 basis. Let us first define 
a more general version of PBC called PBC* where 
some subset of qubits can be initialized in the |0) 
state. Apart from that, definitions of PBC and PBC* 
are the same. Eirst we will show that U can be ef¬ 
ficiently simulated by PBC* on n + m qubits with 
the initial state C) Indeed, replace each 

T-gate of U by the gadget shown on Eig. This 
gadget uses one ancillary qubit prepared in the magic 
state |T) ~ |0) + e*^/^|l). The latter is equivalent to 
\H) modulo Clifford gates, |T) = e^^l^HS^\H). Let 
= (a|0)+yd|l) be the input state for the gadget. Let 
ai and a 2 be the measured eigenvalues oi ZZ and IX 
operators, see Eig. One can check that the gadget 


outputs a state 'ipai,a 2 where 

|^+_) ~ ZT|^), 

IV-— 

Eurt her more, all four measurement outcomes are 
equally likely. Applying a correcting Clifford op¬ 
erator I, Z, S, ZS for the measurement outcomes 

++,H—, —h,-respectively, one gets the desired T 

gate. Let U' be the circuit obtained from U by replac¬ 
ing each T gate with the gadget as above. The final 
measurement of n qubits in the 0,1 basis is equiva¬ 
lent to a non-destructive eigenvalue measurement of 
Zi,...,Z^ after which the final state is discarded. 
This allows one to commute all Clifford gates of U' 
towards the end of the circuit by properly updating 
which Pauli operator one has to be measured at each 
step. Once a Clifford gate reaches the end of the cir¬ 
cuit, it serves no purpose and can be discarded. We 
conclude that U can be simulated byaPBC* onn+m 
qubits. Let Pi,... ,P^ G P’^+’^ be the Pauli opera¬ 
tors that have to be measured. We can assume that 
all Pauli operators Pi,..., P^ pairwise commute. In¬ 
deed, suppose this is not the case and let t be the first 
time step when Pt anti-commutes with P^ for some 
s < t. Let (/) be the state reached just before the 
measurement of Pt. Note that Pgcj) = Pcj) and thus 
(/ P(7tPt)(j) = (a^Pt ±Ps)(/). One can easily check that 
an operator V = {(JtPt i Ps)/V^ is a Clifford uni¬ 
tary operator whenever Pt and Pg anticommute. This 
shows that both outcomes at have the same proba¬ 
bility and the measurement of Pt has the same effect 
as drawing at from the uniform distribution and ap¬ 
plying the Clifford unitary V defined above. Such a 
unitary V can be commuted towards the end of the 
circuit and discarded. Hence we can assume that all 
operators Pi,..., P^ pairwise commute. Eurthermore, 
one can append the sequence Pi,..., P^ at the be¬ 
ginning with dummy Pauli measurements of Zi for 
all qubits i initialized in the |0) state. Applying the 
above argument again one can modify the sequence 
Pi,..., P^ such that all Pt commute with the dummy 
measurements, that is, any operator Pt acts trivially 
on the qubits initialized in the |0) state. Therefore 
such qubits serve no purpose and can be discarded. 
We have shown that the original circuit U can be sim¬ 
ulated by a PBC on m qubits with r steps and pairwise 
commuting Pauli operators Pi,..., P^. Eurthermore, 
since the number of independent pairwise commuting 
Pauli operators on m qubits is at most m, we can as¬ 
sume that r < that is, the PBC has the standard 
form. This completes the proof of Theorem 

Let us now prove TheoremLet Q be a fixed PBC 
on n k qubits and let p{Q) be the probability that 
the final outcome of Q is bout = 1- Our goal is to 
approximate p{Q) on a classical computer assisted by 
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FIG. 3. Graphs G' and G" used in the dehnition of stabilizer states cj)' and 0'', 


see Eq. (11). 


in 


T) 



out 


Using the Monte Carlo method one can estimate 
p{Q) with a constant precision by generating M = 
min {1, 0(cr^)} independent samples of Thus the 
overall cost of adding k virtual qubits is 

C ~ xmax I 

V i=l 


|T) ~ | 0 > + e*’^/'‘|l) 


FIG. 4. Implementation of the T-gate. 


It remains to choose a decomposition in Eq. ( 12 ). One 
can decompose each copy of \H){H\ as a linear com¬ 
bination of stabilizer states using the identity 

l^)(^l = <^l|0)(0| + <^2|1)(1| + <^3|+)(+|5 (15) 


a PBC on n qubits. Suppose one can find a decompo¬ 
sition 

( 12 ) 

i=l 

for some /c-qubit stabilizer states <pi and some real 
coefficients ai. By linearity, one has 

X 

p{Q) = '^aip{Qi), (13) 

i=l 

where Qi is a PBC-type computation obtained from Q 
by initializing the first k qubits in the state (j)i rather 
than \H)®^. We note that any stabilizer state can 
be represented as \(j)i) = Ui\0)^^ for some Clifford 
unitary Ui. Commuting Ui towards the end of Qi 
and properly updating which Pauli operator has to 
be measured at each step we can assume that | 0 i) = 
|0)<8)^ for all i. As we have already showed above, such 
computation Qi is equivalent to a PBC on n qubits. 
Let bi be the output bit of Qi such that E( 6 i) = p{Qi). 
Define a random variable 

X 

i=l 


where 


1 1-V2 1 

“ 1 - 2 ’ “ 3-71 

and then take the tensor product decomposition. 
Thus X = 3^ and C ^ x = This completes 

the proof of Theorem 


APPENDIX A 


In this section we prove Lemma Since the con¬ 
stant term /0 contributes a multiplicative factor uo'^ 
to (/), we can assume wlog that /0 = 0. Define coef¬ 
ficients gi,... ,gn G Z 2 such that 



0 if /a = 1 (mod 4), 

1 if /a = 3 (mod 4), 

fa/2 if fa = 0,2 (mod 4). 


Let S C [n] be the set of indexes a such that fa = 
1,3 (mod 4). A simple algebra shows that 

^fix) ^ . (^_lY{x) ^ 


The above shows that ^ is an unbiased estimator of 
p{Q) and one can generate a sample of ^ by repeating 
a PBC on n qubits x times. Since all variables bi are 
independent, the variance of ^ is bounded as 

= (14) 

i=l 


where 

n 

9{x) = '^gaXa+ E] fa,bXaXb- 

a=l l<a<b<n 

Let US first assume that S' 7 ^ 0. Without loss of gener¬ 
ality S 3 n (otherwise permute the variables). Define 
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a new summation variable ^ G F 2 such that Ua = Xa 
for a = 1,..., n — 1 and yn = '^aes Note that 

j ya '-d a = I,...,n-I, 

X ^ \ 

I 2/n + EaeS\n Va if « = n. 


Obviously, S'g = 0 if L'^{z) includes at least one of the 
variables Za with 2 r < a < n — 1 . Otherwise one gets 

r 

Se = n 5e,a, 

a=l 


Using the identity 


where 


jJ2aeS^a — (mod 2) ^ 

one arrives at 

(/) = ^ 

ySFJ 

with 

Hv) = QaVa + {9a + 9n)ya + 5n2/n 

a^S aES\n 

n—1 

+ X/ fa,byayb + '^ fa^nVayn 

l<a<b<n—l a=l 

n—1 

^ ^ ^ ^ fa^nVayb- 
0=1beSXn 


5e,a = E (-1)'" 

'22 a— 1) 22 a =0 51 


122 a+ 0 ( 6 ,a)2;2a-l+'f^(e50)2;2a 


for some coefficients u{e^a) = 0,1 and v{e,a) = 0,1 
determined by L'. A direct inspection shows that Se,a 
takes values 0 and ± 2 . We conclude that takes 
values 0 and ±2’^“^“’^. This leaves only nine possible 
combinations for (/) = S'q + Namely, (/) = 0 (if 
both So and Si are zero), or (/) = for 

some m G Z 4 (if exactly one of So and Si is non¬ 
zero), or (/) = for some m G Z 4 (if 

both So and Si are non-zero). This is equivalent to 
the statement of Lemma The case when S' = 0 is 
completely analogous. 


APPENDIX B 


Let us split the sum over y into two terms correspond¬ 
ing to = O 5 1- We get 

(/) = So + iSi, 


where 


Se= Y. e = 0,l. 

Using the definition of h{y) one gets 

h(z, S) ^ ^ H(i i)Z(iZi) -|- L^i^z^ 

l<a<b<n—l 


where Le{z) is a linear Boolean function and iL is a 
symmetric binary matrix with zero diagonal. Impor¬ 
tantly, the matrix H does not depend on e. It is well- 
known that any matrix H as above can be transformed 
into a block-diagonal form with all non-zero blocks be- 

by a transformation H U^iLU, where 


mg 


0 1 
1 0 


V is an invertible binary matrix [32]. The number of 
non-zero blocks in V^HV is r, where 2 r is the rank 
of H (which is always even). Moreover, the matrix 

V can be computed in time 0{n^) using the standard 
linear algebra methods [32|. Performing a change of 
variable z ^ Vz and defining new linear functions 
L'^{z) = L^iyz) one gets 




z GIF ■ 


-1 


(^_l)Ea = l 22a-l22a+^e(^), 


In this section we describe a numerical method for 
computing a low-rank decomposition of a given tar¬ 
get state (j) into stabilizer states. We shall be mostly 
interested in the case \(j)) = 

Let Sn be the set of pure n-qubit stabilizer states. 
Given a target n-qubit state (j) and an integer y we 
would like to check whether 0 admits a decomposition 

1'?^) = 

a=l 


for some 0i,..., G 5^. It is known [19] that the 
size of Sn grows asymptotically as Thus 

performing an exhaustive search over all y-tuples of 
n-qubit stabilizer states becomes impractical even for 
small values of n. Instead, we used a Monte Carlo al¬ 
gorithm that performs a random walk on the set of y- 
tuples (01,..., 0^) G 5^ and tries to maximize a suit¬ 
able objective function F(0i,..., 0^). Specifically, we 
choose F(0i,..., 0^) = ||II0||, where 11 is the projec¬ 
tor onto the linear subspace spanned by 0i,..., 0y. 
Assuming that ||0|| = 1, the decomposition Eq. (16) 
is possible iff maxF(0i,..., 0^) = 1. 

We define the random walk on S^ using the Glauber 
dynamics. Let 0 > 0 be some fixed parameter which 
has a meaning of the inverse temperature. At each 
step of the walk we randomly choose a state label a G 
{1, 2,..., y} and a Pauli operator P G V^. All choices 
are made with respect to the uniform distribution. We 
perform a tentative move 0^ ^ 0^^ = c{I + P)(l>a^ 
where c is a normalizing coefficient. One can easily 





12 


check that this move maps stabilizer states to stabi¬ 
lizer states. If the move increases the value of the ob¬ 
jective function F, we accept the new state that is, 
(j)a is replaced by Otherwise, the new state is 
accepted with a probability Pacc = exp [—^{F — F')], 
where F and F' are the values of the objective func¬ 
tion before and after the move. If (/ -j- F)^^ = 0, the 
move is rejected right away. The walk is stopped as 
long as we observe a tuple of states with F = 1. We 
start with relatively small values (3 = Pin and gradu¬ 
ally increase P using the geometric sequence until it 
reaches the final value P = Pf. This corresponds to 
the simulated annealing method. For each value of 
P the random walk was repeated for M ^ 1 steps. 
In practice we used values Pin = Pf = 4000, and 
M = 1000. The number of annealing steps was chosen 
as 100. Since we worked with relatively small values of 


n, the stabilizer states pj were represented by vectors 
of size 2^. 


Since our target state \(j)) = \H®^) has real ampli¬ 
tudes in the computational basis, one can easily show 
that the optimal decomposition Eq. (16) can be chosen 
such that all stabilizer states pa have real amplitudes 
as well (the real part of a stabilizer state is either zero 
or proportional to a stabilizer state). Accordingly, we 
restricted the random walk to the subset of cor¬ 


responding to real stabilizer states. Clearly, a move 
pj p'j = c{I + P)Pj maps real states to real states 
if P contains even number of T’s. The move was ac¬ 
cepted only if this condition is satisfied. 


The best decompositions of \H^^) found using this 
method a re sh own below. Here we use the notations 
so that \H) = |0) -1- {\/2 — 1)|1). 


of Section 


IV 


\H^^) = (2 - V2)\E2) + (-1 + V2)\K2). 

= (-8 + 6V2)|B3,3) + (2 - V2)\E3) + (-1 + V2)\K3). 


\H^^) = (4 - 272)154,0) + (20 - 1472)|54,4) + (-4 + 372)|04) + (-3 + 2V2)Z^^\K4). 


|5®3) = (-16 + 1272)|55,o) + (-40 + 2872) jEg, 5 ) + (-4 + 372)IO 5 ) + (10 - 772) 
+ (3 - 272 ) 5105 ) + (7 - 572)5|55). 


Here K = Yli^j A{Z)ij applies cnotrolled-Z to each 
pair of qubits. The stabilizer decomposition of \H^^) 
is shown in Eq. (11). By definition, the number of 
terms y in these decompositions gives an upper bound 
on the stabilizer rank Xn- We conjecture that all above 
decompositions and the one in Eq. (11) are optimal in 
the sense that X — Xn- 


APPENDIX C 

Let Xn be the stabilizer rank of Here we 

prove a lower bound Xn = 

Let 0 be a pure n-qubit state. Define the T-count of 
p denoted r{p) as the minimum integer r such that p 
can be prepared starting from the all-zeros state by a 
quantum circuit composed of Clifford gates, T-gates, 
and (postselective) eigenvalue measurements of Pauli 
operators, such that the number of T-gates is at most 
r. We claim that 


XtW > (17) 


Indeed, as was shown in Section the T-gate can 


be realized by a gadget that consumes one copy of 
the magic state \H) and performs (postselective) Pauli 
measurements. Thus we can prepare p starting from 
t{P) copies of the magic state |iL) by a sequence of 
(postselective) Pauli measurement and Clifford opera¬ 
tions. Since the latter do not increase stabilizer rank, 
we can write 0 as a linear combination of Xr((/)) stabi¬ 
lizer states. This is equivalent to Eq. 0- 

We shall now choose a state p will a relatively small 
T-count and a large stabilizer rank. Define 

7n) = |6'i 06»20- • •06'„), |6»fe) = |0) + (2''+i-l)|l). 

Lemma 4. The state pn has 2^ distinet amplitudes 
in the eomputational basis. 

We postpone the proof of the lemma until the end 
of the section. Let us first show that pn has a large 
stabilizer rank. Indeed, any stabilizer state has C = 
0(1) distinct amplitudes in the computational basis. 
Thus any linear combination of y stabilizer states has 
at most distinct amplitudes. Applying this to pn 
one gets > 2^^, that is, x{Pn) = ^{n). 

Let us now show that pn has a small T-count. Eirst 
we claim that the state Ok has T-count 0{k). Indeed, 
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we can first prepare a state (g) |0) and then 

apply multiple control CNOT gate such that 

the last qubit is the target one. This creates a state 

|a;i, a;2,..., Xk+i) ® \x1X2 • • • Xk+i). 

cce{o,i}^+i 

Measuring the first /c + 1 qubits in the X-basis and 
postselecting the outcome + leaves the last qubit in a 
state (2^+^ — 1)|0) + |1), which coincides with Ok mod¬ 
ulo a bit-flip. One can easily check that the multi¬ 
ple control CNOT gate can be implemented 

using 0{k) Toffoli gates. Furthermore, the Toffoli 
gate can be implemented using seven T-gates j33l[34] . 
Thus Ok has T-count 0{k) and therefore has T- 
count = 0{in?). Substituting this into 

Eq. (17) yields x ^2 > that is, Xn = 


Proof of Lemma [H Consider any basis vector x G 
F 2 . Let X C {1, 2,..., n} be the support of x. Then 


{x\4>n) = n (2' 

keK 


/c+1 


1). 


The lemma follows from the following fact. 

Proposition 1. Suppose K^M [2,oc) are finite 
subsets of integers sueh that 


n ( 2 " - 1 ) = n ( 2 ™ - !)• 

keK meM 

Then K = M. 

Proof. First we claim that 

P(l-2-'’) > 1-2"“+^ for all a >1. 

b>a 

Indeed, define x = 2~^. Then 

n(l-2-'')-i = n(l-a;2-'’)-i 


(18) 


(19) 


b>a 


b>0 

P 


=i+E^"n(i-2"")“'- 

p=l q=l 

Define fp = n:=i(i — 2 ^). One can easily check that 
^p > limp^oo fp > 1/4. Since = 1/2, one gets 

00 00 

P(l_2-f')-i < l+2a;+4^a;P < ^(2a;)P = (l-2a;) 


b>a 


p=2 


p=0 


Now let s{K) and s{M) be the sum of all elements 
in K and M respectively. Assume wlog that s{K) > 
s{M). Then Eq. implies 


2 s(k)-s(m) ^ nmGM(l ^ 

n).GK(i-2-^) 


< 


< 


Ylk>,{l-2-^) 1-2 


-1 


= 2 . 


Here the last inequality follows from Eq. (19). Thus 
s{K) = s{M) and 

n (1 - 2 -^=) = n (1 - 2 -™)=^. ( 20 ) 

keK meM 

Let ki and mi be the smallest elements of K and M 
respectively. Assume wlog that mi > ki. Let us show 
that in fact mi = ki. Indeed, otherwise mi > ki 1. 
Then Eq. (20) implies ^ < 1 — 2~^^ and 


C>(l-2“’”i) P (1-2“'') > (1-2 “’"i)2. 

b>mi-\-l 


Here the last inequality follows from Eq. (19). Thus 
(1 — < 1 — 2“^i which implies mi < ki P 1 

leading to a contradiction. We conclude that ki = 
mi. Thus we can cancel the factor (1 — 2~^^) in both 
parts of Eq. (20) and use induction in the number of 


elements in the largest of the sets K, M to show that 
K = M. □ 


□ 


Finally, let us sketch an argument that could po¬ 
tentially provide a stronger lower bound on Xn- Con¬ 
sider a decomposition where 

(pa are normalized stabilizer states. We can assume 
wlog that Pa are linearly independent. Define a vector 
c = (ci,... ,c^) and a Gram matrix Ga^g = {palPg)- 
Then (c|G|c) = 1. Let gmin > 0 be the smallest eigen¬ 
value of G. Then G > gmini and thus ||c|p < 

Let be the largest magnitude of the overlap between 
and a normalized n-qubit stabilizer state. One 
can easily check that Sn < The identity 1 = 

Ea=iCa(</>a|'f^®”) implies 1 < (5„||c||i < 

We conclude that % > gmin^n'^ > This 

proves that x > in the special case when all 

states pa are pairwise orthogonal, that is, gmin = 1- 


-1 


This is equivalent to Eq. (19). 
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